Add MLP & QLoRA Fused Ops and Kernels, Mixtral by fabianlim · Pull Request #29 · foundation-model-stack/fms-acceleration

fabianlim · 2024-05-30T08:45:35Z

Completing more items in #25.

Decided to remove the L40 benchmarks.

Verified that we can reproduce the roughly 20% speedups using the fused ops and kernels.

These are per-device throughputs, so for two GPUs multiply by 2 to get the actual total throughput.
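The per-device to total conversion, and the usual way a relative speedup such as the ~20% above is computed, can be sketched as follows. This is a minimal sketch assuming data-parallel training, where each GPU processes its own share of the data so total throughput scales linearly with the number of GPUs; the function names and the numbers in the usage lines are hypothetical and are not the PR's benchmark results.

```python
def aggregate_throughput(per_device: float, num_gpus: int) -> float:
    """Total throughput, assuming each GPU processes its own data shard
    and throughput adds up linearly across devices (an assumption)."""
    return per_device * num_gpus


def relative_speedup(baseline: float, improved: float) -> float:
    """Percentage throughput gain of `improved` over `baseline`."""
    return (improved / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical per-device tokens/sec, for illustration only.
    baseline_per_device = 1000.0
    fused_per_device = 1200.0
    print(aggregate_throughput(fused_per_device, num_gpus=2))  # 2400.0
    print(round(relative_speedup(baseline_per_device, fused_per_device), 1))  # 20.0
```

Because both the baseline and the fused numbers are scaled by the same GPU count, the relative speedup is the same whether it is computed per device or on the totals.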

Verified that we can reproduce the 75% memory reduction using 4-bit base weights.
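The 75% figure is consistent with bytes-per-parameter arithmetic: a 16-bit (bf16/fp16) weight takes 2 bytes and a 4-bit weight takes 0.5 bytes. A minimal sketch, assuming a bf16 baseline and counting only the frozen base weights; it ignores the small per-block quantization constants that 4-bit formats store, as well as activations, LoRA adapters and optimizer state. The 7B parameter count and the helper names are hypothetical.

```python
# Bytes per parameter for the two storage formats compared here
# (assumption: bf16 baseline, 4-bit quantized base weights).
BYTES_PER_PARAM = {"bf16": 2.0, "4bit": 0.5}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the base weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def reduction_pct(before: float, after: float) -> float:
    """Percentage memory saved going from `before` to `after`."""
    return (1.0 - after / before) * 100.0


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    bf16 = weight_memory_gib(n, "bf16")
    q4 = weight_memory_gib(n, "4bit")
    print(f"bf16: {bf16:.2f} GiB, 4-bit: {q4:.2f} GiB")
    print(reduction_pct(bf16, q4))  # 75.0
```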

Also, with FSDP on two GPUs, we see a further 50% memory reduction.
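A rough model of the additional 50%: assuming FSDP's full-sharding strategy, each rank keeps only 1/N of the sharded state (parameters, gradients, optimizer state), so going from one GPU to two halves that portion per GPU. This is a minimal sketch under that assumption only; activations and the layer that is temporarily all-gathered during forward/backward are not sharded, so measured savings can differ. The function name and the 10 GiB figure are hypothetical.

```python
def per_gpu_sharded_gib(total_sharded_gib: float, num_gpus: int) -> float:
    """Per-rank memory for state that is fully sharded across ranks
    (assumption: even split, no replication of the sharded state)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return total_sharded_gib / num_gpus


if __name__ == "__main__":
    total = 10.0  # hypothetical GiB of shardable state
    one_gpu = per_gpu_sharded_gib(total, 1)
    two_gpus = per_gpu_sharded_gib(total, 2)
    print(one_gpu, two_gpus)  # 10.0 5.0
    print((1.0 - two_gpus / one_gpu) * 100.0)  # 50.0
```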

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

fabianlim · 2024-05-30T15:43:34Z

Running a set of benchmarks now. Will merge after they complete.


fabianlim · 2024-06-06T14:53:47Z

@achew010 please post an update once you have obtained the new benchmarks.

fabianlim added 7 commits May 30, 2024 16:43

refactor

073e23b

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

fixes

4fa64fe

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

refactor mistral

fb81605

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

add mixtral

0fd8b0e

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

some refactoring after introducing mlp

0e41679

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

remove extraneous files

3f03ce4

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

add bnb

97f013c

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

fabianlim requested a review from achew010 May 30, 2024 08:45

fabianlim self-assigned this May 30, 2024

fabianlim mentioned this pull request May 30, 2024

Initial Addition of FusedOps and Kernels Plugin With Model Patcher #25

Merged

7 tasks

fabianlim changed the title ~~Add MLP Fused Ops and Kernels, Mixtral~~ Add MLP Fused Ops and Kernels, Mixtral, QLoRA Kernels May 30, 2024

fabianlim changed the title ~~Add MLP Fused Ops and Kernels, Mixtral, QLoRA Kernels~~ Add MLP & QLoRA Fused Ops and Kernels, Mixtral May 30, 2024

achew010 approved these changes May 30, 2024


fabianlim force-pushed the fix-foak-final branch 2 times, most recently from 2617d8c to fa50cf2, May 30, 2024 11:50

lint + fmt and improvements to readme

a308a61

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

fabianlim force-pushed the fix-foak-final branch from fa50cf2 to a308a61, May 30, 2024 11:51

fabianlim added 2 commits May 31, 2024 15:22

bench fixes

435d685

need to handle lora adapters device due to foundation-model-stack#26

f666d5e

fabianlim mentioned this pull request May 31, 2024

Failure in FSDP Benchmark Experiment using QLoRA with Custom Fused Modules #3

Closed

fabianlim added 2 commits June 1, 2024 03:27

allow replay of failed benches, addressing comment in foundation-model-stack#14

77bc92b

update benches (remove l40)

2f98563

fabianlim merged commit 8103238 into foundation-model-stack:dev Jun 2, 2024

fabianlim merged 12 commits into foundation-model-stack:dev from fabianlim:fix-foak-final
